Software and Computational Tools for LC-MS-Based Epilipidomics: Challenges and Solutions

CONTENTS
LC-MS-Based Epilipidomics Data Analysis
  Feature Detection
  Lipid Identification
  Lipid Quantification
  Results Investigation
Current GUI-Based Computational Tools to Address Epilipidomics Challenges
  Dealing with Low Abundance
  Dealing with High Structural Diversity, Native Lipid Isobars/Isomers, Unique Fragmentation
  Dealing with Nomenclature
  Databases Containing Epilipids
Emerging Tools for Epilipidomics
  Metabolic Network Analysis
  Molecular Networking
  Machine Learning-Based Tools
  Mass Spectrometry Search Tool and Mass Spectrometry Query Language
Summary and Outlook
Author Information
  Corresponding Authors
  Authors
  Author Contributions
  Notes
  Biographies
Acknowledgments
References

Lipids play a crucial role in cellular structure and function, including cell signaling, membrane plasticity, and trafficking. Alterations of the lipid composition in cells, tissues, or organelles have been associated with a large number of diseases, including inflammation, cancer, and degenerative or metabolic disorders. 1−5 Within biological systems, lipids can undergo a range of enzymatic and nonenzymatic reactions (e.g., oxidation, nitration, sulfation, and halogenation) that introduce structural modifications and/or new functional groups to the native molecule. 6 The resulting modified lipid species, generally referred to as "epilipids", are known to be heavily involved in the regulation of physiological and pathological conditions. 7−16 Several analytical approaches have been applied to lipid analysis and have been extensively reviewed elsewhere. 17−19 Among them, high-resolution mass spectrometry (MS) currently represents the most popular technique due to the unprecedented sensitivity, mass accuracy, and resolving power offered by modern MS instruments. 20 In addition, separation techniques such as liquid chromatography (LC) can be conveniently coupled to MS analyzers to separate the sample prior to detection, which reduces ion suppression and increases the signal-to-noise ratio for low-abundance analytes. 21,22 Moreover, the recent development of MS instruments equipped with ion mobility spectrometry (IMS) supports further characterization of the measured lipid species and increases identification confidence in untargeted studies. 22 Despite these technological advances, only a few epilipid classes (e.g., eicosanoids and docosanoids) 8,23 have been extensively studied over the past decade, and our global understanding of the "epilipidome" (bio)chemistry is still very limited. 15,24
The main obstacle to a comprehensive epilipidome analysis and annotation arguably lies in its intrinsic chemical complexity, which causes a series of both analytical and computational challenges:
(1) Low abundance. Epilipids are generally low-abundance and transient species within biological systems (e.g., the absolute amounts of lipid oxidation products in vivo are estimated to be in the order of 0.03−3.0 mol % of the total nonoxidized lipidome). 25 As a consequence, despite the exceptional sensitivity of modern MS detectors, the signal produced by modified lipids is often close to the instrumental limit of detection. 16 This is a major obstacle to the use of shotgun and/or MS imaging methods in epilipidomics studies, as these approaches suffer from low overall sensitivity due to the poor ionization efficiency of low-abundance metabolites. Hyphenated techniques (e.g., LC-MS) not only provide an additional separation dimension but can also increase the overall dynamic range by reducing ion suppression effects. For these reasons, LC-MS currently represents the technique of choice for the study of epilipids.
(2) High structural diversity. The nature and likelihood of lipid modifications are largely determined by the molecule's chemical structure, since most reactions occur at energetically favorable sites of the molecule. 26 For example, double bonds are the sites most susceptible to electrophilic addition in polyunsaturated fatty acids (PUFAs), while allylic and bis-allylic positions are prone to radical attack. In addition, chemical modifications may also involve the lipid headgroup, vinyl ether moieties, and ester functional groups. 27−29 Considering the lipids' remarkable structural diversity, the large number of potential modification sites, and the range of possible reactions (e.g., oxidation, nitration, sulfation), an enormous number of chemically distinct derivatives is possible. 26 In this regard, Ni et al. estimated the size of the epilipidome by knowledge-based and systematic enumeration, considering only a few main oxidative modifications (Figure 1). 16 If an algorithm enumerates epilipids as complete permutations with repetition of all modifications at all possible allylic and bis-allylic positions (approach A in Figure 1), the result is a vast search space of over 10¹² species, which includes a large number of unnatural structures. In contrast, if knowledge-based algorithms enumerate only the modifications actually reported for enzymatic and free-radical-based mechanisms of fatty acids (FAs) (approach B in Figure 1), the result is a more biologically relevant product space of fewer than 10⁵ structures. Even so, the scheme in Figure 1 illustrates the high structural diversity of the epilipidome and makes clear that generating a comprehensive database of epilipids for identification purposes (a common approach for native species) is hardly feasible.
(3) Large number of isomers/isobars. A direct consequence of the vast chemical space covered by the epilipidome is the significant number of isobaric (same nominal mass) and isomeric (same elemental composition, hence same exact mass) species. As an example, querying the exact mass 649.40 with a tolerance of ±0.05 Da in the LIPID MAPS Structure Database (LMSD) 32 returns multiple isobaric species (Figure 2A). Correspondingly, searching a specific elemental formula within the same database returns multiple isomers with the same exact mass but different chemical structures (P-IsoPGE2-PC [LMGP20010043], P-LGE2-PC [LMGP20010049], and P-LGD2-PC [LMGP20010050]; Figure 2B). Modern high-resolution MS systems provide sufficient mass accuracy and resolving power to distinguish isobaric species; however, the problem persists for structural isomers, which exhibit identical exact mass and isotopic pattern. In this case, a confident annotation can be achieved only by inspecting the collected fragmentation spectra.
Figure 1 caption (partially recovered): In this example, a lipid species that is not reported in common databases, such as SM(d18:1/20:4), can be constructed and selected as a precursor for oxidation, aiming at enumerating all possible modifications at all possible positions. (B) Knowledge-based enumeration of only the modifications of fatty acid residues at specific sites actually reported for enzymatic and free-radical-based mechanisms. As an example, the widely reported oxylipin products of arachidonic acid can be used to predict oxidation products of commonly reported phospholipids with an arachidonic acid residue, such as PE(18:0/20:4), generating products with oxylipin residues, such as PE(18:0/15-HETE) and PE(0:0/PGE2), which are confirmed by the literature. 30
(4) Different fragmentation compared to native lipids. Fragmentation mass spectra (also referred to as MS/MS spectra) can be used as "molecular fingerprints" to assign a tentative annotation to unknown detected metabolites. 33 Reference libraries of MS/MS spectra can be used to automate metabolite annotation using a spectral matching approach (i.e., experimental MS/MS data are compared with reference spectra, and the annotation is assigned based on the best match). Although this currently represents the most effective approach for metabolite annotation, 34,35 two main obstacles severely limit its application to epilipids. First, comprehensive collections of commercially available analytical standards are currently lacking for these species. This prevents the acquisition of reference MS/MS spectra from the pure compounds and their integration into existing libraries. Second, the fragmentation patterns of epilipids may differ significantly from those of the parent molecules. In fact, even "small" modifications in the lipid chemical structure can produce substantial differences at the MS/MS level. For instance, 5-hydroxy-PGI1 and 6-keto-PGF1α both originate from 5Z,8Z,11Z,14Z-eicosatetraenoic acid (arachidonic acid) and even share the same elemental composition (C20H34O6); nevertheless, they produce significantly different fragmentation spectra. 36 As a consequence, MS/MS libraries of native lipids cannot be effectively exploited for the annotation of modified species, as doing so would produce unreliable results.
(5) Nomenclature. Lipid nomenclature has historically been an arduous task due to the lipidome's structural diversity and complexity. 37−40 A systematic shorthand notation for MS-based lipid structure annotation has evolved over the past two decades to allow lipids to be reported unequivocally in a clear and succinct way. 41,42 However, despite the recent efforts of LIPID MAPS, 37,38 a unified nomenclature scheme for modified lipid species across all lipid categories is still lacking. As a consequence, improper annotation and over-reporting are common in research papers. 43 For example, 1-palmitoyl-2-(9-oxo-nonanoyl)-sn-glycero-3-phosphocholine (LMGP20010008) has been reported under at least six different but similar names in recent publications. 44−49 Even when these names differ by only one character, simple string matching fails when parsing lipid names (e.g., when querying databases). This reduces the usability of reference data and complicates the unified computational treatment of lipid names. 40,50 Common abbreviation systems have been adopted by the community only for a few oxylipin species (e.g., 15-HETE, 12(R)-HpETE, and PGE2). 51−53 However, these abbreviations carry almost no structure-related information and require prior knowledge for their correct interpretation.
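To make the difference in scale between the two enumeration strategies of challenge (2) concrete, the following minimal Python sketch contrasts them under strongly simplified, hypothetical assumptions. In approach A, every candidate site is either unmodified or carries one modification from a small vocabulary, and all combinations are generated; in approach B, a native fatty acyl residue is only replaced by products listed in a lookup table. The function names, the modification vocabulary, and the lookup table `REPORTED_PRODUCTS` (a few well-known arachidonic acid oxylipins) are illustrative only and are not the enumeration code used by Ni et al.

```python
from itertools import product

# Approach A (systematic, hypothetical model): each candidate site is either
# unmodified ("-") or carries exactly one modification from the vocabulary,
# and every combination with repetition is generated.
def systematic_count(n_sites: int, n_modifications: int) -> int:
    """Number of site patterns produced by approach A: (m + 1) ** s."""
    return (n_modifications + 1) ** n_sites


def systematic_enumeration(n_sites: int, modifications: list[str]) -> list[tuple[str, ...]]:
    """Explicitly list all site patterns (only feasible for small inputs)."""
    options = ["-", *modifications]
    return list(product(options, repeat=n_sites))


# Approach B (knowledge-based, hypothetical lookup table): a native fatty acyl
# residue is replaced only by products reported for that fatty acid.
REPORTED_PRODUCTS = {"20:4": ["5-HETE", "12-HETE", "15-HETE", "PGE2"]}


def knowledge_based_enumeration(lipid_class: str, residues: list[str],
                                reported: dict[str, list[str]] = REPORTED_PRODUCTS) -> list[str]:
    """Substitute one residue at a time with each product reported for it."""
    products = []
    for position, residue in enumerate(residues):
        for oxylipin in reported.get(residue, []):
            modified = list(residues)
            modified[position] = oxylipin
            products.append(f"{lipid_class}({'/'.join(modified)})")
    return products


if __name__ == "__main__":
    # Approach A grows exponentially with the number of sites...
    print(systematic_count(n_sites=10, n_modifications=3))  # 1048576 (= 4 ** 10)
    # ...whereas approach B stays proportional to the reported chemistry.
    print(knowledge_based_enumeration("PE", ["18:0", "20:4"]))
```

With only ten sites and three modification types, approach A already yields over a million patterns for a single chain; combining chains, lipid classes, and further modification types quickly leads to the magnitudes quoted for approach A.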
Although the above-mentioned challenges currently hamper the study of the epilipidome by MS-based methods, they can be addressed (or at least mitigated) by software-assisted data analysis pipelines. Dedicated tools have started to become available; yet, information on the existing solutions is rather unsystematic, hindering their large-scale adoption by the community. In this context, the present review focuses on the computational tools and approaches currently available for the analysis of LC-MS epilipidomics data. In the following sections, a general epilipidomics LC-MS workflow is described, with a focus on the data analysis steps and the challenges that computational tools are expected to overcome (analytical approaches for epilipid analysis were recently reviewed elsewhere 54,55). We placed our emphasis on GUI-based software, as it provides more usable solutions for researchers lacking programming skills. 56 Finally, emerging computational tools that we believe can support the analysis of LC-MS epilipidomics data are described.

■ LC-MS-BASED EPILIPIDOMICS DATA ANALYSIS
In this section, a general workflow of an LC-MS-based (epi)lipidomics study, including all data processing steps, is described (Figure 3). We divided the data analysis stage into four main steps: feature detection, identification, quantification, and results investigation. A detailed description of each of these steps is provided below. Emphasis is placed on how the above-discussed challenges translate into computational problems that prevent the ready implementation of lipidomics-optimized pipelines in epilipidomics workflows.
Feature Detection. In LC-MS-based omics experiments, the term feature typically refers to a two-dimensional signal (i.e., a peak in the m/z and retention time dimensions) that represents a chemical compound. The first step in untargeted LC-MS data processing is normally "feature detection" (a.k.a. "peak picking"), in which boundaries (m/z and retention time values) and intensities are determined for all features detected in the analyzed samples. 57 The final goal is to turn the complex LC-MS raw data into a list of detected features ready for downstream data analysis. 57,58 Feature detection normally involves several consecutive substeps, including the construction and deconvolution of extracted ion chromatograms (EICs), feature filtering (e.g., deisotoping), and alignment. We identified two of these substeps as potential pitfalls when dealing with epilipidomics data.
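The EIC construction and feature filtering substeps listed above can be illustrated with a deliberately naive Python sketch (standard library only). It is not the algorithm of any tool discussed in this review; real implementations add smoothing, peak deconvolution, deisotoping, and alignment. The `Scan` container and the parameters `ppm_tol`, `noise_level`, and `min_points` are hypothetical and merely mirror the kind of user-set thresholds discussed in the following paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    rt: float               # retention time (min)
    mz: list[float]         # centroided m/z values
    intensity: list[float]  # matching intensities


def extract_ion_chromatogram(scans, target_mz, ppm_tol=5.0):
    """Return one (rt, intensity) point per scan: the most intense centroid
    within ppm_tol of target_mz, or 0.0 when no centroid falls in the window."""
    tol = target_mz * ppm_tol * 1e-6
    eic = []
    for scan in scans:
        best = 0.0
        for mz, inten in zip(scan.mz, scan.intensity):
            if abs(mz - target_mz) <= tol and inten > best:
                best = inten
        eic.append((scan.rt, best))
    return eic


def detect_features(eic, noise_level, min_points=3):
    """Keep contiguous EIC segments above noise_level spanning >= min_points scans.

    Returns (rt_start, rt_end, apex_rt, height, area) tuples; the area is
    computed with the trapezoidal rule.
    """
    features, run = [], []
    for rt, inten in eic + [(None, 0.0)]:  # sentinel point closes the last run
        if rt is not None and inten > noise_level:
            run.append((rt, inten))
            continue
        if len(run) >= min_points:
            apex_rt, height = max(run, key=lambda point: point[1])
            area = sum((r2 - r1) * (i1 + i2) / 2
                       for (r1, i1), (r2, i2) in zip(run, run[1:]))
            features.append((run[0][0], run[-1][0], apex_rt, height, area))
        run = []
    return features
```

Raising `noise_level` in this sketch can make a low-abundance peak disappear entirely (too few points survive the cutoff), while lowering it admits more background segments, which is the trade-off described next.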
First, during the EIC building step, the user is normally asked to set a noise threshold that determines the minimum signal intensity (i.e., area or height) for an LC peak to be retained as a feature. This threshold has to be chosen with particular care in epilipidomics applications, owing to the naturally low abundance of modified lipid species. An overly high noise cutoff will erroneously discard relevant signals. On the other hand, an overly low noise threshold inevitably retains a much larger number of background signals in the final data table, which increases computation time/cost and might complicate downstream processing steps. This problem can be mitigated by fine-tuning the feature detection parameters and by setting additional constraints (e.g., minimum number of data points, signal-to-noise ratio) for an LC signal to be considered an actual feature. 59−62
Second, untargeted LC-MS analyses typically rely on generic chromatographic column chemistries and gradients. Under these conditions, positional isomers often exhibit similar chromatographic behavior, producing coeluting and/or shoulder peaks. The EIC deconvolution step aims at correctly splitting neighboring LC peaks that are not baseline-separated. Although robust algorithms have been developed for this purpose, 60,62−64 manual tuning and visual inspection are usually needed to ensure the fidelity of the results. However, this requires the user to have a good understanding of the (often numerous) algorithm parameters, since even small changes in these settings have been shown to have a dramatic impact on the results. 65,66
Lipid Identification. Lipid annotation is unanimously recognized as the primary bottleneck in (epi)lipidomics studies. 16 MS-based identification of lipids can be carried out at six levels of structural information, as described by Liebisch et al., 37 with the highest achievable level being the "fatty acyl/alkyl/sphingoid base structure". As for the annotation of small molecules in general, assigning an elemental formula to the unknown m/z signals of interest is the first committed step toward structural identification. 67 As discussed above, the epilipidome covers a remarkably vast chemical space over a relatively narrow molecular weight range, which inevitably leads to the frequent occurrence of isobaric species. Therefore, an exact mass match alone is usually not sufficient for an unequivocal elemental composition assignment, as many candidates fall within the mass search window. In this respect, the isotope pattern is an irreplaceable means to rule out incorrect candidate formulas and raise annotation confidence. 68 The next step is the tentative annotation of the (epi)lipid chemical structure based on the collected MS/MS data. The fragmentation spectra are either compared to reference spectral libraries or annotated using a rule-based approach. 16,69 However, epilipids are currently underrepresented in reference MS/MS spectral libraries (see Databases Containing Epilipids) due to the scarcity of commercially available standards. To circumvent the need for comprehensive sets of reference standards, libraries of MS/MS spectra can be simulated in silico using well-characterized fragmentation rules (i.e., class-specific fragment ions and neutral losses) and subsequently matched against the experimental spectra. 70,71 Regardless of the annotation strategy, multiple candidate structures (especially structural isomers) are normally returned. 72 In this regard, additional separation dimensions prior to MS acquisition can provide valuable information to raise annotation confidence.
For instance, chromatographic separation prevents coeluting isomers from being cofragmented and thus avoids the generation of chimeric MS/MS spectra. Annotation constraints based on RT and/or IMS-derived collision cross section (CCS) information can be included in the identification strategy to filter out incorrect candidate structures and reduce the false discovery rate. 72,73
Lipid Quantification. Following the identification step, the analyst is often interested in the absolute or relative quantification of the annotated metabolites across the analyzed samples. 74 Targeted applications rely on calibration curves and/or the addition of internal standards for accurate quantification of the analytes of interest, 75 although this is clearly limited to the molecular species for which analytical standards are available. A semiquantitative approach can be pursued by using a single internal standard per lipid class. 76 In the case of modified lipids, using analytical standards from the same subclass (ideally with the same type of modification) could represent a viable alternative. In practice, untargeted studies do not normally require absolute quantification, and analyte abundance is simply measured as the chromatographic peak area or height. 77 In this regard, it must be taken into account that a given analyte can generate multiple ion species (e.g., [M + H]+ and [M + Na]+ adducts, in-source fragments) that are recognized as distinct features because of their different precursor m/z. As a result, the overall signal is distributed over multiple entries in the final feature list, and relying on a single adduct for quantification could introduce biases in the downstream analysis. The number and type of generated adducts depend not only on the chemical structure and characteristics of the individual molecule but also on experimental variables that are difficult to control (e.g., solvent purity).
Moreover, certain structural modifications can change the ionization behavior and efficiency with respect to the native species (e.g., a different adduct distribution can be favored). 44,78 For instance, most oxidized phosphatidylcholines (oxPCs, e.g., LMGP20010003 and LMGP20010005) are generally detected as formate or acetate adducts, while truncated oxPCs with terminal carboxylic moieties (e.g., LMGP20010006 and LMGP20010007) tend to form [M−H]⁻ ions. 44 Algorithms that automatically group the multiple ion species produced by the same chemical entity can address this problem. In the specific case of epilipids, tools able to "dynamically" pick the correct adduct species for quantification would be highly desirable.
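The adduct-grouping idea mentioned above can be sketched in Python as follows, under simplifying assumptions: only three positive-mode adducts are considered, two features are linked when they coelute within `rt_tol` and imply the same neutral mass within `ppm_tol`, and linked features are merged with a union-find structure so that their peak areas can be summed. The function names and default tolerances are hypothetical; the sketch ignores in-source fragments, isotopes, and negative-mode adducts (e.g., formate), and it does not implement the "dynamic" adduct selection called for above.

```python
from itertools import combinations, permutations

ELECTRON = 0.00054857990946
H = 1.00782503207
N = 14.0030740048
NA = 22.9897692809

# Mass added to the neutral molecule M for each positive-mode ion species.
ADDUCT_MASSES = {
    "[M+H]+": H - ELECTRON,
    "[M+Na]+": NA - ELECTRON,
    "[M+NH4]+": N + 4 * H - ELECTRON,
}


def find_adduct_pairs(features, rt_tol=0.05, ppm_tol=5.0):
    """Link coeluting features whose m/z values imply the same neutral mass.

    features: list of (feature_id, mz, rt) tuples.
    Returns (id1, adduct1, id2, adduct2, neutral_mass) tuples.
    """
    pairs = []
    for (id1, mz1, rt1), (id2, mz2, rt2) in combinations(features, 2):
        if abs(rt1 - rt2) > rt_tol:
            continue  # different retention time: assume different compounds
        for (ad1, m1), (ad2, m2) in permutations(ADDUCT_MASSES.items(), 2):
            neutral1, neutral2 = mz1 - m1, mz2 - m2
            if abs(neutral1 - neutral2) <= neutral1 * ppm_tol * 1e-6:
                pairs.append((id1, ad1, id2, ad2, (neutral1 + neutral2) / 2))
    return pairs


def summed_abundance(features, areas, pairs):
    """Merge linked features (union-find) and sum their peak areas per group."""
    parent = {fid: fid for fid, _, _ in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for id1, _, id2, _, _ in pairs:
        parent[find(id1)] = find(id2)

    groups = {}
    for fid, _, _ in features:
        groups.setdefault(find(fid), []).append(fid)
    return {frozenset(ids): sum(areas[i] for i in ids) for ids in groups.values()}
```

Summing over all ion species of a group, rather than picking one adduct, is only one possible design choice; for epilipids whose adduct distribution differs from that of the native species, selecting the dominant species per group may be preferable.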
Results Investigation. One of the ultimate goals of untargeted LC-MS data analysis pipelines is to enable an informative visualization of the annotation and quantification results. Various data display strategies have been proposed over the years to inspect identification results, spot misannotations, etc. In this section, we describe two data visualization approaches that were developed in other research fields and later adapted to (epi)lipidomics applications: Kendrick mass defect and Circos plots.
The Kendrick mass defect (KMD) plot is a graphical data representation designed to assist the identification of compounds that contain repeating units in their chemical structures. 79,80 The KMD plot represents the whole LC-MS data set in a single chart, in which compounds containing repeated structural features form easily recognizable homologous series. Briefly, a repeating unit of interest (e.g., CH2) is chosen, and the corresponding Kendrick mass factor is computed as the ratio between its nominal and exact mass. This factor is used to convert measured masses to the Kendrick mass scale, from which the Kendrick mass defect is calculated. The KMD plot is then generated by plotting the KMD for the selected structural unit against the m/z or the Kendrick mass. More details about "Kendrick analysis" in general can be found in Fouquet, 2019. 81 Although originally developed for petroleum analysis, 79,81 KMD(CH2) and/or KMD(H) can be used to highlight differences in the acyl chain length and saturation of homologous lipid species. 82−85 Korf et al. created a KMD plot by plotting two KMDs against each other (i.e., KMD(CH2) on the x-axis and KMD(H) on the y-axis) and used it to identify more than 100 lipid species in Chlamydomonas reinhardtii. 82 In the case of epilipids, the KMD of the specific modification can be used to identify modification products of a lipid class in complex samples. For instance, Helmer et al. plotted KMD(H) against KMD(O) to display cardiolipin oxidation products analyzed via LC-MS (Figure 4A). 83 Chromatographic information (i.e., RT) can also be integrated into the KMD plot as a color-coded scale to enable further visualization possibilities. 83 The resulting three-dimensional KMD plot can be particularly useful to confirm identification results and/or recognize false annotations, since clear class-specific trends are often observable depending on the applied chromatography (i.e., hydrophilic interaction or reversed-phase chromatography). 82,83
The Circos plot is a visualization approach for displaying complex data using highly customizable, information-rich circular layouts; more details can be found in Krzywinski et al. 86 Although originally created for genomic data visualization, 86 Circos diagrams can be adapted to other data types and have recently found applications in many different fields, including epilipidomics. For example, Jha et al. employed a Circos plot to visualize 96 hepatic lipids measured in the livers of 385 mice, together with their fold changes between control and treated groups. 87 Multilayered annotations were also added to highlight significant intergroup correlations, including the strength and approximate chromosomal position of the identified quantitative trait loci. Besides generic packages designed to create Circos plots from any type of data, 88,89 Ni et al. developed the Python package LipidCircos for generating Circos charts dedicated specifically to epilipidomics data. 44 In particular, LipidCircos is intended to display and highlight relationships between identified epilipids and their corresponding native precursors, as well as to overlay customized information such as quantification data across samples and/or time points. 44 In the publication, the authors showcased the package capabilities by generating a Circos plot displaying oxidized phospholipids and their corresponding native species identified in rat primary cardiomyocytes treated with the peroxynitrite donor SIN-1 over 16 h (Figure 4B).
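Returning to the Kendrick mass defect calculation described above, a minimal Python sketch is given below. It assumes the common convention KMD = round(KM) − KM (some authors use the opposite sign), and the helper names (`kendrick_mass`, `kendrick_mass_defect`, `kmd_points`) are hypothetical. Plotting is left to any charting library; the coordinates returned by `kmd_points` would correspond to a two-KMD chart such as KMD(CH2) versus KMD(H).

```python
H = 1.00782503207
O = 15.99491461956
CH2 = 12.0 + 2 * H

# Repeating units as (exact mass, nominal mass).
UNITS = {"CH2": (CH2, 14), "H": (H, 1), "O": (O, 16)}


def kendrick_mass(mass: float, unit: str) -> float:
    """Rescale a mass so that the chosen repeating unit has an integer mass."""
    exact, nominal = UNITS[unit]
    return mass * nominal / exact  # Kendrick mass factor = nominal / exact


def kendrick_mass_defect(mass: float, unit: str) -> float:
    """KMD = nominal (rounded) Kendrick mass minus Kendrick mass."""
    km = kendrick_mass(mass, unit)
    return round(km) - km


def kmd_points(masses, x_unit="CH2", y_unit="H"):
    """Coordinates for a two-KMD plot, e.g. KMD(CH2) versus KMD(H)."""
    return [(kendrick_mass_defect(m, x_unit), kendrick_mass_defect(m, y_unit))
            for m in masses]
```

Because adding one repeating unit shifts the Kendrick mass by exactly its nominal value, all members of a homologous series (e.g., species differing by CH2, or oxidation products differing by O) share the same KMD for that unit and line up horizontally in the plot.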

■ CURRENT GUI-BASED COMPUTATIONAL TOOLS TO ADDRESS EPILIPIDOMICS CHALLENGES
Several freely available tools allow the processing and analysis of LC-MS (epi)lipidomics data. Various R 90−92 and Python 93−95 packages have been developed in the past few years and constitute an important resource for epilipidomics applications. Most of these packages provide only a command-line (scripting) interface, which requires researchers to be comfortable with coding. However, most lipidomics researchers lack even basic programming skills, which limits the wide adoption of these tools by the community. 56 For this reason, we focused the present review on prepackaged software with a user-friendly graphical user interface (GUI), as such software represents a more accessible solution for researchers lacking coding expertise.
Our literature survey retrieved a total of six GUI-based software tools suitable for various stages of the LC-MS epilipidomics data analysis workflow: Lipid Data Analyzer (LDA), 34,96 Lipostar, 60 LipidMatch Flow 97 (recently introduced as an update to the LipidMatch R package 91), LPPtiger 2, 44 MS-DIAL 4, 98 and MZmine 3. 99 In addition, although not a data processing software per se, LipidLynxX 40 was also included in the review, as it is a lipid nomenclature harmonization tool that already integrates specific optimizations for epilipids. Lipostar is proprietary but free for nonprofit institutions, while the other reviewed software packages are open source and freely available. The usability of these tools for LC-MS-based epilipidomics data analysis is summarized in Table 1 and discussed below. In particular, the table highlights the data analysis steps that can be performed by each software, along with which of the five challenges described above it can address (at least partially).
Although five of the tools listed in Table 1 (LDA, Lipostar, LipidMatch Flow, MS-DIAL 4, and MZmine 3) were not specifically designed for epilipidomics applications, they provide the user with a range of algorithms to cover various steps of the data analysis workflow shown in Figure 3, as well as the flexibility to finely tune most of the processing parameters. In contrast, LPPtiger 2 was explicitly developed for the identification of oxidized lipid species; oxidized phospholipids (oxPLs), diacylglycerols (oxDG), triacylglycerols (oxTG), and cholesteryl esters (oxCE) are supported at the time of writing.
Dealing with Low Abundance. As explained above, epilipids often produce signals close to the noise level due to their naturally low abundance. Tunable feature detection algorithms can reduce the risk of erroneously discarding relevant signals while keeping the computational burden reasonable. Four of the tools listed in Table 1 (LDA, Lipostar, MS-DIAL 4, and MZmine 3) include standalone feature detection algorithms and allow the user to adjust the processing parameters toward low-abundance signals. In contrast, LipidMatch Flow does not provide a built-in peak picking algorithm and relies on MZmine 2 63 for feature detection. 91,97 LPPtiger 2 uses a different approach that does not require prior feature detection: the software predicts oxidized lipid species starting from a user-provided list of unmodified molecules of interest and searches for the corresponding MS/MS spectra directly in the raw data.
During feature detection, there are a number of situations in which a feature is not detected in one (or more) samples even though an LC peak is actually present. In these cases, a zero-intensity value is (erroneously) assigned, producing so-called "missing values" in the aligned feature table. 100 This is generally caused by suboptimal parameter settings, such as an overly high noise threshold, inconsistent chromatogram resolving (deconvolution), or misalignment. In this context, Lipostar, MS-DIAL 4, and MZmine 3 can automatically reinspect the aligned feature table to cope with false missing values that are artifacts of the processing. 60,98,99 In these tools, the algorithm examines each missing value individually and checks the original raw data for an omitted chromatographic signal where the peak is expected (i.e., in the RT window associated with the examined feature). If a meaningful LC peak is found, it is integrated and the retrieved peak area is used. This approach (often referred to as "secondary feature detection" or "gap filling") can significantly reduce the number of missing values in the final feature table, providing more accurate quantification data suitable for robust downstream statistical analysis. 99
Dealing with High Structural Diversity, Native Lipid Isobars/Isomers, Unique Fragmentation. The structural diversity of the epilipidome, the frequent occurrence of isobaric/isomeric species, and the specific fragmentation patterns of certain modified lipid classes are all hurdles that directly hamper the annotation of both known and unknown (epi)lipids. For convenience and to avoid redundancy, these challenges are discussed together in this section.
Analytical Chemistry pubs.acs.org/ac Review
As discussed in Lipid Identification, a tentative chemical structure is assigned to the detected features based on the collected MS/MS data. Annotation based on spectral matching (i.e., experimental spectra matched against a reference MS/MS library) represents the most popular approach in MS-based lipidomics 16 and is offered by all the reviewed software except LDA, which uses an approach based on customizable mass lists. The reference MS/MS library can be integrated in the software package, curated in-house by the user, or generated in silico using a rule-based approach. In the latter case, class-specific fragmentation rules formulated from experimental data are used to generate the in silico MS/MS spectra. All tools come with a set of class-specific fragmentation rules that the user can customize to varying degrees (for instance, LDA, MZmine 3, and Lipostar allow the user to modify and/or add fragmentation criteria through the GUI). Intensity-free spectra can be computed by MZmine 3 and LPPtiger 2, while intensity relationships between fragments can be defined in LDA and MS-DIAL 4, which may help in assigning the sn-position of acyl chains. Lipostar also generates intensity-free spectra but allows the user to assign weights to the fragment ions, which are then taken into account in the spectral match scoring. RT- and CCS-based constraints can be included in the identification workflow to raise annotation confidence. 22 MZmine 3, MS-DIAL 4, Lipostar, and LDA enable the assignment of specific RT windows to the different lipid classes, which can be used to avoid incorrect annotations: features annotated as belonging to a certain lipid class but eluting outside the expected window are discarded. CCS-based annotation constraints can be defined in MS-DIAL 4 and Lipostar. RT- and CCS-based filtering options are not available in LPPtiger 2 and LipidMatch Flow.
Notably, MZmine 3 also allows discarding noisy features and/or false annotations based on chromatographic peak shape (e.g., tailing factor) and additional filters (e.g., Kendrick mass defect). 85 The differences in the customizability of the annotation workflow among the reviewed software are summarized in Table 2.
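The spectral matching step discussed above can be illustrated with a minimal Python sketch of a cosine (normalized dot product) score. This is a generic textbook formulation, not the scoring function of any reviewed tool: peaks are paired greedily within a fixed tolerance `tol` in Da (a hypothetical default), and an intensity-free reference can be represented by unit intensities or, loosely analogous to Lipostar's fragment weights, by user-chosen weights in place of intensities. The function names are hypothetical.

```python
import math


def match_peaks(query, reference, tol=0.01):
    """Greedy one-to-one peak pairing within `tol` (Da), strongest pairs first."""
    candidates = [
        (iq * ir, i, j)
        for i, (mq, iq) in enumerate(query)
        for j, (mr, ir) in enumerate(reference)
        if abs(mq - mr) <= tol
    ]
    candidates.sort(reverse=True)
    used_q, used_r, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            pairs.append((i, j))
    return pairs


def cosine_score(query, reference, tol=0.01):
    """Cosine similarity between two centroided spectra of (mz, intensity) pairs.

    For an intensity-free reference, unit intensities (or per-fragment weights
    used in their place) can be supplied as the reference "intensities".
    """
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    dot = sum(query[i][1] * reference[j][1]
              for i, j in match_peaks(query, reference, tol))
    return dot / (norm_q * norm_r)
```

A score close to 1 indicates that the experimental and reference spectra share most of their intense fragments; RT- or CCS-based filters, as described above, would then be applied to the candidates surviving a score threshold.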
Modified lipids are currently underrepresented in reference spectral libraries (oxPLs and oxTGs are the only epilipid classes covered in public reference MS/MS libraries at the time of writing). Therefore, some packages (i.e., Lipostar and LPPtiger 2) also offer the possibility of generating in silico MS/MS spectra starting from a user-provided database of lipid chemical structures.
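Generating an intensity-free in silico spectrum from fragmentation rules can be sketched in Python as follows. The two rules (`"carboxylate"`, fatty acyl carboxylate anions, and `"ketene_loss"`, neutral loss of an acyl chain as a ketene from an [M−H]⁻ precursor) are illustrative assumptions for negative ion mode only; actual rule sets in the reviewed tools are class-, adduct-, and instrument-specific and are not reproduced here. Acyl chains are described by a hypothetical `(carbons, double_bonds, extra_oxygens)` tuple, so an oxidized chain such as a HETE residue is written as `(20, 4, 1)`.

```python
MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688
WATER = 2 * MASS["H"] + MASS["O"]


def fatty_acid_mass(carbons: int, double_bonds: int, extra_oxygens: int = 0) -> float:
    """Monoisotopic mass of a free fatty acid CnH(2n-2db)O(2+k)."""
    hydrogens = 2 * carbons - 2 * double_bonds
    return (carbons * MASS["C"] + hydrogens * MASS["H"]
            + (2 + extra_oxygens) * MASS["O"])


def in_silico_spectrum(precursor_mz, chains, rules):
    """Intensity-free negative-mode spectrum built from hypothetical rules.

    chains: (carbons, double_bonds, extra_oxygens) for each acyl residue.
    rules: any subset of {"carboxylate", "ketene_loss"}.
    Returns a sorted list of (mz, label) tuples.
    """
    peaks = []
    for c, db, ox in chains:
        name = f"FA {c}:{db}" + (f";O{ox if ox > 1 else ''}" if ox else "")
        fa = fatty_acid_mass(c, db, ox)
        if "carboxylate" in rules:
            peaks.append((fa - PROTON, f"[{name} - H]-"))
        if "ketene_loss" in rules:
            # Loss of the acyl chain as a ketene (fatty acid minus water).
            peaks.append((precursor_mz - (fa - WATER), f"[M-H-({name} - H2O)]-"))
    return sorted(peaks)
```

For example, `in_silico_spectrum(800.0, [(16, 0, 0), (20, 4, 1)], {"carboxylate"})` yields the palmitate and hydroxy-eicosatetraenoate anions (m/z ≈ 255.2330 and 319.2279), which could then be scored against an experimental spectrum.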
Notably, Lipostar provides an alternative annotation approach that aims to overcome the lack of well-characterized fragmentation rules for the majority of epilipid classes. Following a first identification of native unmodified lipids via spectral matching, potential oxidized species are searched among the features in the data set that remained unannotated. The software starts from the database of native lipid structures and calculates the theoretical exact masses of the oxidized forms based on a list of user-specified modifications (e.g., M + O, M + O − 2H) and adduct type(s). Subsequently, the m/z values of the unknown features are matched against the calculated theoretical masses of the oxidized lipids. When a match is found, the MS/MS spectrum is checked, and fragment ions exhibiting the same mass shift as observed between the native and oxidized precursors are taken into account in the spectral match. Moreover, a direct connection between LPPtiger 2 and Lipostar has recently been established to exploit the benefits of both tools (see Figure 5). Specifically, Lipostar can be used to perform feature detection and a "pre-identification" (based on MS and MS/MS data) of potential oxidized species in the raw data. This information is then used by LPPtiger 2 in its multiscoring annotation workflow. 44 In this way, significant computation time can be saved while consistent annotation accuracy is maintained.
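The mass-shift strategy described above can be sketched in Python as three small steps: computing theoretical m/z values of oxidized forms from native neutral masses, matching unannotated feature m/z values within a ppm tolerance, and counting native fragments that reappear in the oxidized spectrum either unshifted or shifted by the modification mass. This is an illustration of the general idea, not Lipostar's implementation; the modification and adduct lists, the tolerances, and all function names are hypothetical, and only positive-mode adducts are used so that candidates of opposite polarity cannot collide.

```python
MASS = {"H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688
NA_ION = 22.9897692809 - 0.00054857990946

# Hypothetical user-specified modifications as elemental deltas (cf. M + O, M + O - 2H).
MODIFICATIONS = {"+O": {"O": 1}, "+2O": {"O": 2}, "+O-2H": {"O": 1, "H": -2}}
ADDUCTS = {"[M+H]+": PROTON, "[M+Na]+": NA_ION}


def delta_mass(delta):
    return sum(MASS[el] * n for el, n in delta.items())


def oxidized_candidates(natives, modifications=MODIFICATIONS, adducts=ADDUCTS):
    """natives: {name: neutral monoisotopic mass}. Returns (name, adduct, m/z)."""
    candidates = []
    for name, neutral in natives.items():
        for mod, delta in modifications.items():
            for adduct, shift in adducts.items():
                candidates.append((f"{name} {mod}", adduct,
                                   neutral + delta_mass(delta) + shift))
    return candidates


def match_unannotated(feature_mzs, candidates, ppm_tol=5.0):
    """Match unannotated feature m/z values against theoretical candidates."""
    hits = []
    for mz in feature_mzs:
        for name, adduct, theoretical in candidates:
            error_ppm = (mz - theoretical) / theoretical * 1e6
            if abs(error_ppm) <= ppm_tol:
                hits.append((mz, name, adduct, error_ppm))
    return hits


def fragment_support(native_fragments, oxidized_spectrum, shift, tol=0.01):
    """Count native fragments found unshifted and shifted by `shift` in the oxidized spectrum."""
    def present(target):
        return any(abs(target - mz) <= tol for mz in oxidized_spectrum)
    unshifted = sum(present(mz) for mz in native_fragments)
    shifted = sum(present(mz + shift) for mz in native_fragments)
    return unshifted, shifted
```

As a usage example, `oxidized_candidates({"PC 34:2": 757.5622})` (757.5622 being the neutral monoisotopic mass of PC 34:2) produces six candidates, and an unannotated feature at m/z 774.5645 is matched to "PC 34:2 +O" as [M+H]+.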
Analytical Chemistry pubs.acs.org/ac Review

Dealing with Nomenclature. Five levels of information can be used in reporting modified lipid species: (i) the "mass shift level", indicating the mass shift from the parent molecule (e.g., +16 indicates a 16 Da shift in the nominal mass, due, for example, to a hydroxylation); (ii) the "elemental composition level", pointing out the change in elemental composition with respect to the parent molecule (e.g., the number of additional oxygens); (iii) the "type level", which relates to the type of chemical modification (e.g., two OH groups or one OOH group); (iv) the "site level", referring to the position of the modification site on the parent molecule; and (v) the "stereochemistry level", where information about stereochemistry is provided (e.g., 15R or 15S for the OH on HETE). As discussed above, although guidelines for the shorthand reporting of modified lipid species have recently been introduced, 38 the reviewed software tools use different styles and information levels to annotate the same species and/or modification. For instance, LDA can annotate oxidized species at the "type level" or at the sn-position, and it reports them with an "ox" prefix. However, no specific nomenclature is used to discriminate between oxo, keto, epoxy, and furan modifications (i.e., they are all reported as "−O−"). Lipostar and MZmine 3 provide annotation at the "elemental composition level" (e.g., PC 34:2 + 2O). If the specific acyl chain that underwent oxidation can be determined from the MS/MS data, Lipostar reports this information in the identification summary. LipidMatch Flow uses an "elemental composition level" annotation and can specify the FA residue(s) carrying the modification; three different annotation styles are available: OxPC(16:0_18:2(2O)), PC 16:0_18:2;2O, and OxTG(14:0_14:0_18:2(OHOH)). Finally, LPPtiger 2 reports the type of modification ("type level" annotation, e.g., PC(16:0_18:2<2OH>)).
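The loss of information when moving from the "type level" down to the "elemental composition" and "mass shift" levels can be made explicit with a small lookup. In this sketch, which assumes a simplified set of modification groups and their net elemental changes (the table and function name are illustrative), two OH groups and one OOH group become indistinguishable once collapsed to +2O (+32 Da), and a single hydroxylation reproduces the +16 shift mentioned above.

```python
from collections import Counter

# Net elemental change on the acyl chain per modification group (simplified chemistry)
GROUP_DELTAS = {
    "OH": {"O": 1},             # C-H -> C-OH
    "OOH": {"O": 2},            # C-H -> C-OOH
    "oxo": {"O": 1, "H": -2},   # CH2 -> C=O
    "epoxy": {"O": 1},          # C=C -> epoxide
}
NOMINAL_MASS = {"O": 16, "H": 1}


def collapse_type_level(groups):
    """Map a "type level" description such as {"OH": 2} to the lower
    "elemental composition" and nominal "mass shift" levels."""
    composition = Counter()
    for group, count in groups.items():
        for element, n in GROUP_DELTAS[group].items():
            composition[element] += n * count
    mass_shift = sum(NOMINAL_MASS[element] * n for element, n in composition.items())
    return dict(composition), mass_shift
```

The reverse mapping is not unique: from +2O alone, software cannot tell whether two OH groups or one OOH group are present, which is why annotations reported at different levels are not directly interchangeable.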
Such inconsistency in reporting epilipid species limits the establishment of automated pipelines for parsing and querying lipid names across different resources (e.g., databases, software), reduces the usability of reference data, and leads to over-reporting in research papers. In this context, the recently developed LipidLynxX software aims at providing a unified nomenclature system that is compatible with different identification levels and easy to understand for both researchers and computer scripts. The epilipid nomenclature scheme proposed by LipidLynxX relies on the multitiered identification matrix reported in Table 3. Multiple representations of the same lipid species are avoided by means of a controlled vocabulary and an ordered list of modifications. Different abbreviation styles can be interpreted and converted into a shorthand-compatible nomenclature (e.g., Lipostar can be linked to LipidLynxX for automatic nomenclature conversion). Finally, LipidLynxX also provides a linker module to cross-link lipid abbreviations to a collection of available online databases.
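A controlled vocabulary combined with a fixed ordering of modifications is enough to make equivalent spellings collapse to a single string. The sketch below illustrates the principle only; the synonym table, the ordering, and the function name are hypothetical and do not reproduce LipidLynxX's actual vocabulary or rules.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary (synonym -> canonical term) and fixed ordering
VOCABULARY = {"OH": "OH", "hydroxy": "OH", "OOH": "OOH", "hydroperoxy": "OOH", "oxo": "oxo"}
ORDER = ["OH", "OOH", "oxo"]
TOKEN = re.compile(r"^(\d*)([A-Za-z]+)$")  # optional multiplicity + modification name


def canonical_modifications(mod_string: str) -> str:
    """Normalize a comma-separated modification list, e.g. 'oxo,2OH' -> '2OH,oxo'."""
    counts = Counter()
    for token in mod_string.split(","):
        match = TOKEN.match(token.strip())
        if match is None or match.group(2) not in VOCABULARY:
            raise ValueError(f"unrecognized modification: {token!r}")
        multiplicity = int(match.group(1)) if match.group(1) else 1
        counts[VOCABULARY[match.group(2)]] += multiplicity
    # Emit terms in the fixed order, writing the multiplicity only when it exceeds 1
    return ",".join(
        f"{counts[term]}{term}" if counts[term] > 1 else term
        for term in ORDER
        if counts[term] > 0
    )
```

Both "oxo,2OH" and "hydroxy, oxo, hydroxy" normalize to "2OH,oxo", so a name such as PE(16:0_20:4<2OH,oxo>) has a single representation regardless of how the input was spelled.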

■ DATABASES CONTAINING EPILIPIDS
Databases normally play a central role in any computational method and/or workflow. In this section, we review those databases that currently contain relevant information regarding epilipids.
Generic metabolite databases such as METLIN, 101 MassBank, 102 and the MassBank of North America (MoNA) 103 constitute a valuable resource to assist (epi)lipid annotation, as they contain both experimental and in silico-generated MS/MS data for a number of oxidized lipids. Other generic databases such as ChEBI 104 and PubChem 105 contain some epilipids but were not optimized to filter or search for epilipids specifically; however, cross-linking from other lipid databases (e.g., LIPID MAPS LMSD, 32,106 SwissLipids 107 ) is available for data integration. A list of databases where structures of epilipids are collected and easily accessible is provided in Table 4.

Table 3. LipidLynxX combines three shorthand nomenclatures (B = bulk level; M = molecular species level; S = sn-specific level) with five levels of modification information (0 = no modification; 1 = modification mass shift; 2 = modification elemental composition; 3 = modification type; 4 = modification position; 5 = modification stereochemistry) into a combined matrix (e.g., B2, M3). The majority of modified lipid abbreviations can be assigned using this matrix. As an example, the M3 annotation level (i.e., molecular species + modification type) PE(16:0_20:4<2OH,oxo>) indicates that the modification type is known but its position on the FA residues is not. The matrix can be further extended with sn-position and modification position information, leading to the highest annotation level (i.e., S5). Double bond position and cis/trans information can also be added to extend it further to sublevels of S5.

Annotation levels: "type", the modification type level (e.g., two OH); "site", the modification site level (e.g., OH at positions 5 and 12); "R/S", the modification stereochemistry level (e.g., one OH at position 5 is 5R and another OH at position 12 is 12S).
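The annotation matrix described in the Table 3 caption can be represented as two independent axes, which makes it easy to check whether one annotation is at least as detailed as another. Treating the matrix as a partial order over both axes is an assumption made for this sketch, and the function names are hypothetical.

```python
SHORTHAND_LEVELS = "BMS"  # bulk < molecular species < sn-specific
SHORTHAND_NAMES = {"B": "bulk", "M": "molecular species", "S": "sn-specific"}
MODIFICATION_LEVELS = {
    0: "no modification",
    1: "modification mass shift",
    2: "modification elemental composition",
    3: "modification type",
    4: "modification position",
    5: "modification stereochemistry",
}


def matrix_level(shorthand: str, modification: int) -> str:
    """Combine the two axes into a matrix code such as 'M3'."""
    if shorthand not in SHORTHAND_LEVELS or modification not in MODIFICATION_LEVELS:
        raise ValueError("unknown annotation level")
    return f"{shorthand}{modification}"


def at_least(level: str, reference: str) -> bool:
    """True if `level` is at least as detailed as `reference` on both axes
    (only the first two characters of each code are compared)."""
    return (SHORTHAND_LEVELS.index(level[0]) >= SHORTHAND_LEVELS.index(reference[0])
            and int(level[1]) >= int(reference[1]))


def describe(level: str) -> str:
    """Human-readable meaning of a matrix code."""
    return f"{SHORTHAND_NAMES[level[0]]} + {MODIFICATION_LEVELS[int(level[1])]}"
```

Under this reading, S5 dominates every other cell, while codes such as M4 and S2 are not comparable: one knows more about the modification, the other more about the acyl chain positions.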
LIPID MAPS (which stands for LIPID Metabolites and Pathways Strategy) is arguably the most complete and widely used gateway for lipidomics. 32,106,108 Established in 2003, it includes databases of known lipid chemical structures (LMSD), 32 in silico-generated structures (i.e., LIPID MAPS In-Silico Structure Database), computationally generated oxidized phospholipid species (including the corresponding precursor ion's m/z), experimental CCS values (i.e., LIPID MAPS Ion Mobility Database), as well as a repository of experimental lipidomics data sets. 108 Notably, the LMSD also contains nomenclature, references, cross-links to other databases, and experimental MS/MS spectra for several lipid species. At the time of writing, the LMSD includes over 250 oxPLs, 1000 octadecanoids, 1200 eicosanoids, and 1100 docosanoids (accessed September 2022).
SwissLipids is a curated collection of known lipid structures and related information about metabolism and interactions, as well as subcellular and tissue localization. 107 All the information is curated from peer-reviewed literature. The set of known lipid structures is also complemented by a library of theoretical structures obtained by combining known building blocks from the curated set. Both known and theoretical lipids are organized into a single common hierarchy that follows the notation for mass spectrometry-based lipidomic data proposed by Liebisch et al. 37,38 and is consistent with the classification developed by LIPID MAPS. Concerning modified lipids, the database includes octadecanoids (13), eicosanoids (136), and docosanoids (13), as well as a few complex epilipids at the time of writing (accessed September 2022).
The Human Metabolome Database (HMDB) is a freely accessible database containing information about small molecule metabolites found in the human body. 109 It includes a large number of both confirmed and predicted lipid structures. At the time of writing, the HMDB includes 480 eicosanoid species and over 1000 predicted oxPLs.
The RIKEN IMS oxidized phospholipid database contains the MS/MS spectra, acquired in negative ion mode, of a total of 386 molecular species of oxPLs obtained by biogenic conversion from oxidized FAs incorporated into cellular phospholipids. More details can be found in Aoyagi et al. 31 Notably, LIPID MAPS LMSD, SwissLipids, RIKEN IMS oxidized phospholipids, and HMDB are downloadable and can be conveniently connected to different software packages to assist the (epi)lipid annotation pipeline (e.g., Lipostar allows the direct import of the LMSD). 60 Other lipid structure databases (e.g., LipidPedia, 110 LipidBank, 111 or CEU Mass Mediator 112 ) are available only online. Notably, CEU Mass Mediator 3.0 (CMM 3.0) integrates compounds from different sources (e.g., HMDB, LIPID MAPS, KEGG, METLIN, etc.) and offers a module to annotate oxidized lipids. 112 LipidBank is the official database of the Japanese Conference on the Biochemistry of Lipids and contains molecular structures, spectral data (MS, NMR, etc.), and literature information for more than 7000 unique lipid species, including 329 eicosanoids and 5 lipid peroxides (accessed September 2022). LipidPedia is a database of biomedical information for over 4400 lipid species, retrieved from the literature using text-mining strategies, with more than 1,500,000 associated literature references (accessed September 2022). At the time of writing, searching for the term "oxidised" in the database returns only six entries, although comprehensive information (e.g., classification, biological functions, biomedical data, etc.) is provided for them.

■ EMERGING TOOLS FOR EPILIPIDOMICS
In this section, we review emerging computational strategies that can further assist the analysis and annotation of epilipidomics LC-MS data. These methods were originally developed for other applications (e.g., computational biology or metabolomics) but are finding wider use in epilipidomics studies.
Metabolic Network Analysis. Metabolic network analysis is a powerful visualization tool widely used in computational biology. It can be used to display qualitative and quantitative changes in the lipid profile at the system level. A metric particularly interesting for epilipidomics applications is the change of (partial) correlations between pairs of species under different experimental conditions. The intuition here is that a reaction connecting two (epi)lipid species is more likely to be active when the respective epilipids are highly correlated. Consequently, a loss of high correlation between experimental conditions hints at changes in reaction activity. Similar ideas have already been shown to generate novel insights into the regulation of epilipid biosynthesis. Lauder et al., 113 for example, found specific lipoxygenases to be important for coagulation and the thrombotic disorder antiphospholipid syndrome (APS) using an epilipid correlation-network approach.
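The correlation-change idea can be illustrated on a toy network. The sketch below uses plain Pearson correlation rather than partial correlation (a simplification), and the drop threshold, function names, and data layout are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_losses(reactions, condition_a, condition_b, min_drop=0.5):
    """Flag reactions whose substrate-product correlation drops between conditions.

    reactions: iterable of (substrate, product) pairs, i.e., network edges.
    condition_a / condition_b: dict mapping lipid -> abundances, one value per sample.
    """
    flagged = []
    for substrate, product in reactions:
        r_a = pearson(condition_a[substrate], condition_a[product])
        r_b = pearson(condition_b[substrate], condition_b[product])
        if r_a - r_b >= min_drop:
            flagged.append((substrate, product, round(r_a, 2), round(r_b, 2)))
    return flagged
```

With one reaction edge whose substrate and product are perfectly correlated (r = 1.0) in the first condition and weakly anticorrelated (r = -0.4) in the second, the edge is flagged as a candidate change in reaction activity. With real data, the small number of samples per group makes correlation estimates noisy, so such flags are hypotheses rather than conclusions.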
One lipidomics-specific method for extensive network analysis is the Lipid Network Explorer, 114,115 abbreviated as LINEX. It is based on networks that combine fatty acid and lipid class metabolic reactions to compute species-level biochemical connections. Due to this modular structure, it is well-suited to cover epilipids by incorporating position-specific hydroxylation patterns and reactions.
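The way class-level and fatty acid-level reactions combine into species-level connections can be sketched as a pairwise rule check: two species are linked if they differ only in their headgroup by an allowed class reaction, or only in one fatty acyl by an allowed fatty acid reaction. This is a simplified illustration, not LINEX's implementation; the reaction rules (here a hydroperoxidation and a hydroperoxide reduction on 20:4), the species notation, and the function names are assumptions, and position-specific modifications are not modeled.

```python
from collections import Counter
from itertools import combinations

# Illustrative reaction rules (assumptions, far from a complete reaction set)
CLASS_REACTIONS = {frozenset({"PS", "PE"}), frozenset({"PE", "PC"})}  # headgroup conversions
FA_REACTIONS = {
    frozenset({"20:4", "20:4;OOH"}),     # hydroperoxidation
    frozenset({"20:4;OOH", "20:4;OH"}),  # hydroperoxide reduction
}


def lipid(lipid_class, *fatty_acyls):
    """Species-level lipid: class plus sorted fatty acyls."""
    return lipid_class, tuple(sorted(fatty_acyls))


def reaction_type(a, b):
    """Return the rule type linking two species, or None."""
    (class_a, fa_a), (class_b, fa_b) = a, b
    if fa_a == fa_b and frozenset({class_a, class_b}) in CLASS_REACTIONS:
        return "lipid class"
    if class_a == class_b and len(fa_a) == len(fa_b):
        only_a = list((Counter(fa_a) - Counter(fa_b)).elements())
        only_b = list((Counter(fa_b) - Counter(fa_a)).elements())
        if len(only_a) == len(only_b) == 1 and frozenset({only_a[0], only_b[0]}) in FA_REACTIONS:
            return "fatty acid"
    return None


def build_network(species):
    """Return edges (a, b, reaction type) between all species pairs linked by one rule."""
    edges = []
    for a, b in combinations(species, 2):
        kind = reaction_type(a, b)
        if kind is not None:
            edges.append((a, b, kind))
    return edges
```

For PC 16:0_20:4, its 20:4;OOH and 20:4;OH derivatives, and PE 16:0_20:4, three edges result: two fatty acid edges along the oxidation chain and one lipid class edge between PC and PE. Because the rules are kept separate, adding a new epilipid reaction only requires extending the fatty acid rule set.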
To enable a quantitative analysis of lipidomics results, LINEX utilizes statistical metrics such as fold changes and p-values (between sample groups) as well as correlation metrics (between lipid species, per sample group). By projecting differences between sample groups onto the network, it allows the exploration of global changes in the lipidome. Since the network topology depends entirely on lipid metabolic reactions, this enables the identification of groups of reactions with distinct patterns between sample groups as well as the generation of hypotheses on alterations in enzymatic activity.
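Mapping statistics onto the network requires one value per node. The sketch below computes a log2 fold change and a two-sided exact permutation p-value for each lipid; the choice of a permutation test (rather than, for example, a t-test) and all names are assumptions, not necessarily what LINEX uses.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (group_b over group_a)."""
    return log2(mean(group_b) / mean(group_a))


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_b) - mean(group_a))
    extreme = total = 0
    # Enumerate every way of relabeling len(group_b) samples as "group b"
    for idx in combinations(range(len(pooled)), len(group_b)):
        chosen = set(idx)
        perm_b = [pooled[i] for i in chosen]
        perm_a = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(perm_b) - mean(perm_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def node_statistics(abundances_a, abundances_b):
    """Per-lipid (log2 fold change, p-value), ready to be mapped onto network nodes."""
    return {
        name: (round(log2_fold_change(abundances_a[name], abundances_b[name]), 2),
               exact_permutation_p(abundances_a[name], abundances_b[name]))
        for name in abundances_a
    }
```

With three samples per group, the smallest attainable two-sided p-value of the exact test is 2/20 = 0.1, which illustrates why small groups limit statistical resolution.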
Epilipid metabolism pathways are partially searchable in databases (such as Rhea 116 and Reactome 117 ), and it is now possible to generate reasonable epilipidome metabolic networks. Nevertheless, additional efforts should be made to integrate the available information, enable the application of active module identification algorithms, and set the basis for metabolic modeling of the epilipidome. Furthermore, such networks will enable the integration of epilipidomics data with data from proteomics and other omics disciplines. We can therefore expect epilipidomics data interpretation to deliver novel insights into disease mechanisms and general biological processes.
Molecular Networking. Molecular networking (MN) is a relatively new computational strategy for the analysis and visualization of untargeted LC-MS data. 118 It relies on the fundamental assumption that molecules sharing similarities in their chemical structures also share fragment ion patterns when subjected to MS/MS fragmentation. Building on this hypothesis, MN creates networks of MS/MS spectral relations in which structurally related molecules are connected. Briefly, each MS/MS spectrum in a data set is compared pairwise against every other, and spectral similarity is assessed using a modified cosine similarity score. Here, not only signals at identical m/z are taken into account; fragment ions offset by the same m/z difference as the precursors are also considered "matching" peaks. 119 Based on the calculated similarity scores, MS/MS data are organized into a network where each ion is represented as a node, and ions sharing an MS/MS spectral similarity above a user-defined threshold are connected by an edge. In this way, molecules with similar chemical structures (e.g., differing by simple transformations such as oxidation/reduction, glycosylation, alkylation, etc.) appear as connected nodes and cluster into "molecular families". This greatly assists the visualization of the detected chemical space and facilitates the annotation of unknown, structurally related metabolites within a molecular network. Moreover, in combination with common feature-finding tools, semiquantitative information (peak area) can be mapped onto each node in the network to infer semiquantitative differences among samples or sample groups. 119 Although originally developed for metabolomics data analysis to assist the identification of novel metabolites, 120 molecular networking has recently been applied in several (epi)lipidomics studies. 121

Machine Learning-Based Tools. Machine learning (ML) approaches to support the untargeted LC-MS data analysis workflow are becoming increasingly popular. For example, ML-assisted algorithms have been developed for automated peak picking and filtering of noise features based on chromatographic peak shape and intensity. 128−131 Another interesting application of ML-based methods is the prediction of RT 132 and ion mobility-derived CCS 133 values for metabolite identification purposes. The goal of these prediction models is to calculate these values without the need for their experimental measurement (which requires reference samples and/or standards). Notably, Zhou et al. developed a dedicated model for the prediction of lipid CCS values. 134 A further application of ML for metabolite annotation is represented by tools for the automated interpretation of MS/MS data, such as the SIRIUS software suite 135 and MSNovelist. 136 In particular, these methods offer database-free predictions of elemental formulas and chemical structures, so that annotation is not restricted to searching against structural databases (e.g., PubChem, LMSD) and/or matching against experimental MS/MS spectral libraries. This can be particularly helpful in the identification of truly new, unreported metabolites. Notably, the recently released SIRIUS 5 includes a dedicated module for lipid structure predictions. 137

Mass Spectrometry Search Tool and Mass Spectrometry Query Language. Deposition of untargeted MS data in the public domain is growing rapidly, largely thanks to the increasing adoption of universal, vendor-neutral MS data formats (e.g., mzML).
The two MS data mining tools described in this section, the Mass Spectrometry Search Tool (MASST) 138 and the Mass Spectrometry Query Language (MassQL), 139 have been recently developed by the Dorrestein lab with the aim of making the ever-growing untargeted MS data repositories (∼12,000 data sets comprising ∼7,500,000 files as of September 2022) an easily accessible resource to assist the annotation of unknown molecules and structural analogues.
The MASST tool is a web-based search engine for MS data. 138,140 It enables querying of MS/MS spectra across public spectral libraries (e.g., GNPS libraries, 118,140 all three MassBanks, 102 etc.) and repositories of metabolomics MS data (e.g., GNPS/MassIVE, 118 Metabolomics Workbench, 141 MetaboLights, 142 etc.). A specific MS/MS spectrum of interest can be searched by copy-pasting the spectrum peak list and defining a set of search tolerances within a user-friendly web interface. 140 A query result includes all matches to identical or analogous MS/MS spectra in public spectral libraries and repositories, along with their associated metadata and/or sample information. In particular, metadata can be linked at the data set level (e.g., instrument type, taxonomy, keywords), the file level (e.g., sample type, age, sex, body site, disease, etc.), or the single annotated spectrum level (e.g., biological activity and structural class information).
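Both the molecular networking workflow and analogue searches of the kind MASST returns rest on spectral similarity. The modified cosine score described in the molecular networking paragraph can be sketched as below. This is a simplified version under stated assumptions: intensities are used as given, peaks are paired greedily, and the 0.02 m/z tolerance and function name are illustrative; production implementations typically add intensity scaling, peak filtering, and a minimum number of matched peaks.

```python
from math import sqrt


def modified_cosine(peaks_1, precursor_1, peaks_2, precursor_2, tolerance=0.02):
    """Simplified modified cosine between two MS/MS spectra.

    peaks_*: lists of (m/z, intensity). Two fragments match if their m/z values agree
    within `tolerance`, or agree after shifting the first fragment by the precursor
    m/z difference. Each peak is used at most once (greedy assignment).
    """
    shift = precursor_2 - precursor_1
    candidates = []
    for i, (mz_1, int_1) in enumerate(peaks_1):
        for j, (mz_2, int_2) in enumerate(peaks_2):
            if abs(mz_2 - mz_1) <= tolerance or abs(mz_2 - (mz_1 + shift)) <= tolerance:
                candidates.append((int_1 * int_2, i, j))
    candidates.sort(reverse=True)  # strongest intensity products first

    used_1, used_2, score = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_1 and j not in used_2:
            used_1.add(i)
            used_2.add(j)
            score += product

    norm_1 = sqrt(sum(intensity ** 2 for _, intensity in peaks_1))
    norm_2 = sqrt(sum(intensity ** 2 for _, intensity in peaks_2))
    return score / (norm_1 * norm_2) if norm_1 and norm_2 else 0.0
```

For two spectra whose precursors differ by 15.99 (an added oxygen), a fragment that carries the modification still counts as a match, so the score is 1.0 instead of the 0.8 obtained when only identical m/z values are matched.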
MassQL is a novel query language for the mining of MS data. 139 Inspired by the SQL programming language, MassQL implements a consensus vocabulary to search for MS patterns using human-readable query strings. Searchable terms include both MS patterns (e.g., precursor ion m/z, isotopic patterns) and MS/MS fragmentation patterns (e.g., diagnostic fragments and neutral losses), with support for both data-dependent (DDA) and data-independent acquisition (DIA). Additionally, terms for separation methods (i.e., retention time and ion mobility drift time), user-defined tolerances (e.g., ion intensity, mass accuracy), and Boolean conjunctions (i.e., AND, OR) can be used to define inclusion/exclusion criteria and create more complex pattern queries. As an example, choline-containing phospholipids normally exhibit diagnostic fragments and neutral losses in their MS/MS spectra (positive ionization mode) arising from the glycerophosphocholine group: m/z 184.0739 and the corresponding neutral loss of 183.0733 (phosphocholine), and m/z 125.0004 (2,2-dihydroxy-1,3,2-dioxaphospholan-2-ium). 143 The following MassQL query string can be formulated to quickly retrieve all the MS/MS spectra that contain such diagnostic fragments/losses (within a 10 ppm mass tolerance) and likely belong to glycerophosphocholine lipids:
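Independently of MassQL syntax, the filtering logic that such a query expresses (diagnostic product ions and neutral losses checked within a ppm tolerance and combined with AND or OR) can be sketched in plain Python. This is not MassQL or its engine; the input layout and function names are hypothetical.

```python
def within_ppm(observed: float, target: float, ppm: float = 10.0) -> bool:
    """True if `observed` lies within `ppm` parts per million of `target`."""
    return abs(observed - target) <= target * ppm * 1e-6


def has_diagnostics(precursor_mz, peaks, product_ions=(), neutral_losses=(),
                    ppm=10.0, require_all=True):
    """Check an MS/MS peak list (m/z, intensity) for diagnostic ions and losses.

    require_all=True combines the checks with AND; False combines them with OR.
    """
    checks = [any(within_ppm(mz, ion, ppm) for mz, _ in peaks) for ion in product_ions]
    # A neutral loss shows up as a fragment at precursor m/z minus the lost mass
    checks += [any(within_ppm(mz, precursor_mz - loss, ppm) for mz, _ in peaks)
               for loss in neutral_losses]
    if not checks:
        return False
    return all(checks) if require_all else any(checks)


def filter_spectra(spectra, **criteria):
    """spectra: iterable of (spectrum_id, precursor_mz, peaks); returns matching ids."""
    return [sid for sid, precursor_mz, peaks in spectra
            if has_diagnostics(precursor_mz, peaks, **criteria)]
```

Searching for the product ions at m/z 184.0739 and 125.0004 within 10 ppm retains a spectrum with fragments at m/z 184.0733 and 125.0004 and discards one lacking them.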

■ SUMMARY AND OUTLOOK
The growing interest in epilipidomics has boosted the development of ad hoc strategies for data analysis in recent years, although standard data analysis pipelines are still missing. In this review, we first aimed at describing the available lipidomics software endowed with a graphical user interface (to facilitate its use) and freely available for academic use. Although some workflows originally developed for generic unmodified lipidomics or metabolomics can be tweaked to analyze certain epilipids, epilipid-specific criteria still demand extra attention. From the availability of solid structure and spectral libraries to the confidence of identification algorithms, data processing of epilipids faces more challenges than well-developed metabolomics workflows. For example, because of the similarity of multiple isomeric species (e.g., two OH groups or one OOH group) at different modification sites, the algorithms for identifying exact epilipid structures and for accurately quantifying each isomer must implement multiple filters and rules that relate to specific epilipid classes and differ from common metabolomics algorithms, with the risk of becoming time-consuming. These unique challenges call for adapting or developing computational tools for epilipidomics that are suitable to deal with the intrinsic complexity of the epilipidome (intrinsic challenges) at different levels, as well as with the analytical challenges in the field. Our analysis leads us to believe that, although several steps forward have been made in developing software and computational approaches for epilipidomics, the currently available in silico tools still suffer from limitations arising from both experimental and purely computational issues.
From the experimental perspective, the lack of a large collection of standards for epilipidomics has a negative impact both on quantification and on the accurate definition of fragmentation rules or spectral matching strategies to be used in identification. Similarly, while CCS libraries are available for native lipids, the collection of CCS values for epilipids is still in its infancy. In silico CCS values predicted by epilipid-specific algorithms might be a potential starting point, but experimental CCS values for epilipids are needed to validate the predictions. Finally, the analysis of the epilipidome requires accurate analytical procedures to reduce the risk of analyzing artifacts. Indeed, degradation products such as oxidized lipids could form during sample processing if samples are not correctly handled, and this will hamper the ability of any software to deliver reliable results. 76 From the computational perspective, referring to the four steps of data analysis discussed in this review, lipid quantification and results investigation will require only minor improvements in the existing software, taking advantage of the experience gained in the field of native lipids. For example, the main bottleneck in lipid quantification is again the lack of proper standards. In addition, the specific changes in adduct distribution for epilipids need to be considered, and the adduct clustering algorithms already available for native lipids have not yet been extensively validated for epilipids. Therefore, we still recommend reviewing the automatic lipid identifications together with peak integration to make quantification reliable. Concerning the feature detection step, we described several tools that facilitate peak picking close to the noise level (e.g., using gap-filling algorithms), which will be very beneficial for epilipidomics.
It is noteworthy that, while LC-MS/MS data acquired in DDA mode are widely supported, only Lipostar, LipidMatch Flow, and MS-DIAL 4 support DIA modes such as All Ion Fragmentation (AIF), MSE, or SWATH. Regarding the lipid identification step, we believe that large improvements are still possible in dealing with the high number of isomers and isobars. So far, only LPPtiger 2 has been specifically designed to identify epilipids from a number of classes (several phospholipid classes, cholesteryl esters, diacylglycerols, and triacylglycerols) with the aim of discriminating isomers and isobars, at the cost of a time-consuming analysis, while the other software tools discussed here offer faster but less accurate alternative solutions. A first step forward in epilipid identification is the recent link between LPPtiger 2 and Lipostar 2, which combines the accurate epilipid annotation of the former with a short analysis time, thanks to a preanalysis performed by Lipostar 2. Similarly, we suggest that other software could connect to LPPtiger 2 in the near future. However, at this stage we still recommend manually rechecking automatically identified epilipids to reduce misannotations and overreported species (due to adduct clustering failure) and thus improve the overall quality of epilipidomics data analysis. Indeed, given the well-known lack of standards, the prediction of epilipid properties such as fragmentation patterns, retention times, or CCS values could partially consolidate lipid identification. Machine learning tools offer the potential to predict CCS and RT values for small molecules with high accuracy, 144,145 but they have not been suitably explored for lipids or epilipids, where a high number of rotatable bonds leads to higher complexity and lower performance. Nevertheless, concerning RT prediction, elution order prediction could be preferred as a first attempt, as it provides more stable results. 146

The scattered solutions, each with a strong focus on certain fields, call for joint efforts through collaborative workflows connecting multiple tools. In addition, there is a trend to adapt new algorithms from metabolomics, including molecular networking and machine learning, to epilipidomics with additional adjustments. However, unlike several algorithms that can be applied to a large number of metabolite classes across highly diverse structures, 147 it is still challenging to develop algorithms that can correctly distinguish epilipid isomers with high confidence.
Finally, we described the currently available databases containing information about epilipids. Databases containing structures of epilipids and/or related MS/MS libraries are still very limited and fragmented. Across the various sources, determining how many epilipids a given database contains is still a challenging task, probably because of the limited attention devoted to epilipids in the past. The lack of well-defined keywords, labeling systems, and, first and foremost, a common nomenclature hampers the effort of collecting the available information. The use of a standard nomenclature is essential for sharing and comparing lipid annotations and for browsing databases. LipidLynxX has the merit of being able to convert different annotations into unified identifiers based on a community-accepted shorthand notation system. 37,38 The computational community should work on connecting software to this tool to achieve a common nomenclature system (software such as BioPAN, 148